An Operand Status Based Instruction Steering Scheme for Clustered Architectures

Authors

  • Yukinori Sato
  • Ken-ichi Suzuki
  • Tadao Nakamura
Abstract

Clustered architectures, which aim to process data within a localized processing element (PE), are one approach to increasing performance despite the wire delay problem. The performance of a clustered architecture depends on its instruction steering scheme. Existing steering schemes insert inter-PE communications to balance the load among PEs. These insertions delay the execution of dependent instructions and degrade performance. In this paper, we propose a novel instruction steering scheme that gives priority to critical dependencies. It identifies critical dependencies by observing the status of an instruction's source operands. We evaluate the proposed scheme and compare it with existing ones. The results show that the proposed scheme outperforms the existing schemes in terms of instructions per clock, because it reduces critical inter-PE communications together with superior load balance among the PEs.
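The abstract does not spell out the steering rule itself, so the following is only a minimal sketch of what an operand-status based policy might look like. It assumes that a source operand whose value is not yet available marks a critical dependency, so the instruction is steered to the PE that produces it, and that an instruction whose operands are all ready is steered to the least-loaded PE. The names `Operand`, `steer`, and `pe_load` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Operand:
    producer_pe: int  # PE that holds (or will produce) the value
    ready: bool       # True once the value has been computed


def steer(operands, pe_load):
    """Pick a PE index for one instruction (hypothetical policy, not the paper's)."""
    # Assumption: a source operand that is still pending is a critical dependency.
    pending = [op.producer_pe for op in operands if not op.ready]
    if pending:
        # Stay with a producer of a pending operand to avoid a critical
        # inter-PE communication; break ties between producers by load.
        return min(pending, key=lambda pe: pe_load[pe])
    # All operands are ready, so no dependency is critical: balance load.
    return min(range(len(pe_load)), key=lambda pe: pe_load[pe])


# Example: three PEs with 3, 1, and 2 instructions currently queued.
pe_load = [3, 1, 2]
print(steer([Operand(0, False), Operand(1, True)], pe_load))  # follows the pending producer
print(steer([Operand(0, True), Operand(2, True)], pe_load))   # falls back to the least-loaded PE
```

In this sketch, load balance only decides the target when no operand is pending, which is one way to read "gives priority to critical dependencies"; the actual scheme may weigh the two differently.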

Similar resources

Compiler-assisted power optimization for clustered VLIW architectures

Clustered VLIW architectures solve the scalability problem associated with flat VLIW architectures by partitioning the register file and connecting only a subset of the functional units to a register file. However, inter-cluster communication in clustered architectures leads to increased leakage in functional components and a high number of register accesses. In this paper, we propose compiler ...

Improving Dictionary-Based Code Compression in VLIW Architectures

Reducing code size is crucial in embedded systems as well as in high-performance systems to overcome the communication bottleneck between memory and CPU, especially with VLIW (Very Long Instruction Word) processors that require a high-bandwidth instruction prefetching. This paper presents a new approach for dictionary-based code compression in VLIW processor-based systems using isomorphism amon...

Pragmatic integrated scheduling for clustered VLIW architectures

Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption. Scheduling for clustered architectures involves spatial concerns (where to schedule) as well as temporal concerns (when to schedule). Various clustered VLIW configurations, connectivity types, and inter-cluste...

PALF: compiler supports for irregular register files in clustered VLIW DSP processors

A wide variety of register file architectures developed for embedded processors have, in recent years, aimed at reducing power dissipation and die size, in contrast to traditional unified register file structures. This article presents a novel register allocation scheme for a clustered VLIW DSP, which is designed with distinctively banked register files in which port access is h...

Code generation for a Coarse-Grained Reconfigurable Architecture

Good tool support is essential for computing platforms because it increases programmability. This is especially the case for reconfigurable architectures because applications need to be mapped on the architecture for each configuration individually. This paper introduces a compiler backend for Coarse Grained Reconfigurable Arrays (CGRA) based on LLVM. The CGRA compiler must be retargetable to ...

Publication date: 2005